AMRITA_CEN @ FIRE 2015: Extracting Entities for Social Media Texts in Indian Languages

Authors

  • M. Anand Kumar
  • Shriya Se
  • K. P. Soman
Abstract

This work was carried out as part of the shared task on Entity Extraction from Social Media Text Indian Languages at the Forum for Information Retrieval and Evaluation (FIRE 2015). People now use social media platforms such as Facebook and Twitter extensively to exchange their thoughts. The volume of Twitter messages is growing rapidly, and their style and short length present a new challenge for language technology. This large amount of textual data has also increased interest in Information Extraction (IE) on such data. Named entity extraction, one of the essential tasks in Information Extraction, aims to extract and classify entities from text. The performance of current standard language processing tools degrades severely on tweets. Hence, different algorithms, both improvised and non-improvised, are necessary for extracting these entities from informal text. This paper deals with extracting named entities from Twitter messages in four Indian languages. The extraction relies mainly on domain-specific features and conventional features. A well-known supervised algorithm, the Support Vector Machine (SVM), is used to extract the entities.

CCS Concepts

  • Theory of computation → Support vector machines
  • Computing methodologies → Natural language processing
  • Information systems → Information extraction
  • Human-centered computing → Social tagging systems
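
The abstract names two feature families, conventional and domain-specific, but this page does not list the actual features. The following is a minimal sketch of what per-token features for tweet text might look like. Every feature name, the `token_features` helper, and the example tweet are assumptions made for illustration, not the authors' feature set.

```python
import re


def token_features(tokens, i):
    """Build a feature dict for tokens[i].

    All feature names here are hypothetical; the paper's real
    feature set is not given in the abstract.
    """
    tok = tokens[i]
    return {
        # Conventional features commonly used in entity extraction
        "lower": tok.lower(),
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "length": len(tok),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        # Social-media (domain-specific) features, assumed for illustration
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "is_url": bool(re.match(r"https?://", tok)),
    }


# Invented example tweet, split on whitespace
tweet = "@user visiting #Chennai with Ravi http://example.com".split()
features = [token_features(tweet, i) for i in range(len(tweet))]
```

Feature dicts of this shape could then be turned into vectors and passed to an SVM classifier, for example with scikit-learn's `DictVectorizer` and `LinearSVC`, though the abstract does not say which SVM implementation the authors used.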


Similar resources

ESM-IL: Entity Extraction from Social Media Text for Indian Languages @ FIRE 2015 - An Overview

Entity recognition is a very important subtask of information extraction and finds applications in information retrieval, machine translation and other higher Natural Language Processing (NLP) applications such as co-reference resolution. Entities are real-world elements or objects such as Person names, Organization names, Product names and Location names. Entities are often referred to as Nam...


Vira@FIRE 2015: Entity Extraction from Social Media Text Indian Languages (ESM-IL)

In this paper we identify and extract “Named Entities” from social media text using conditional random fields (CRF) [3]. The paper presents our working methodology and results on the Entity Extraction from Social Media Text Indian Languages task of FIRE-2015. We have extracted named entities from two languages, Hindi and English. Named Entity Extraction system is implemented based on CR...


Entity Extraction from Social Media Text Indian Languages (ESM-IL)

This paper shows the implementation of named entity recognition (NER) which is one of the applications of Natural Language Processing and is regarded as the subtask of information retrieval. NER is the process to detect Named Entities (NEs) in a document and to categorize them into certain Named entity classes such as the name of organization, person, location, sport, river, city, country, quan...


Named Entity Recognition for Code Mixing in Indian Languages using Hybrid Approach

Automating Named Entity Recognition in social media text has received a lot of attention over the past few years. Named Entities are real-world objects such as Person, Organization, Product and Location. Identifying these entities in social media text is an important and challenging task due to the informal nature of the text found there. One such challenge that is faced in recognizing...


AmritaCEN_NLP @ FIRE 2015 Language Identification for Indian Languages in Social Media Text

The growth of social media content, such as Twitter and Facebook messages and blog posts, has created many new opportunities for language technology. User-generated content such as tweets and blogs in most languages is written in Roman script due to distinct social culture and technology. Some users write in their own language's script or in mixed script. The primary challenges ...




Publication date: 2015